Detection of Documentary Scene Changes by Audio-Visual Fusion

Authors

  • Atulya Velivelli
  • Chong-Wah Ngo
  • Thomas S. Huang
Abstract

The concept of a documentary scene was inferred from the audio-visual characteristics of certain documentary videos. It was observed that the visual component alone did not carry enough information to convey a semantic context for most portions of these videos, whereas joint observation of the visual and audio components conveyed a better semantic context. From the observations that we made on the video data, we generated an audio score and a visual score. We then generated a weighted audio-visual score within an interval and adaptively expanded or shrank this interval until we found a local maximum of the score. The video is ultimately divided into a set of intervals that correspond to the documentary scenes in the video. After obtaining this set of documentary scenes, we checked for redundant detections.
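
The interval search described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the audio score, the visual score, or the weights are computed, so everything below is an assumption made for illustration: `fused_score` is a hypothetical score (a weighted mean of per-shot audio and visual scores), `w_audio` and `init_len` are hypothetical parameters, and the search is a plain hill climb on the right edge of the interval. The redundancy check is not covered.

```python
from typing import List, Sequence, Tuple


def fused_score(audio: Sequence[float], visual: Sequence[float],
                start: int, end: int, w_audio: float = 0.5) -> float:
    """Hypothetical weighted audio-visual score of the interval [start, end).

    Assumption: the score is a weighted mean of per-shot scores.
    The paper's actual scoring function is not given in the abstract.
    """
    n = end - start
    mean_audio = sum(audio[start:end]) / n
    mean_visual = sum(visual[start:end]) / n
    return w_audio * mean_audio + (1.0 - w_audio) * mean_visual


def grow_interval(audio: Sequence[float], visual: Sequence[float],
                  start: int, init_len: int = 2, w_audio: float = 0.5) -> int:
    """Expand or shrink the right edge by one step until no move improves the score."""
    total = len(audio)
    end = min(start + init_len, total)
    best = fused_score(audio, visual, start, end, w_audio)
    while True:
        candidates = []
        if end + 1 <= total:
            candidates.append(end + 1)  # expand by one shot
        if end - 1 > start:
            candidates.append(end - 1)  # shrink by one shot
        if not candidates:
            return end
        top_score, top_end = max(
            (fused_score(audio, visual, start, e, w_audio), e) for e in candidates
        )
        if top_score <= best:
            return end  # local maximum reached
        best, end = top_score, top_end


def segment(audio: Sequence[float], visual: Sequence[float],
            init_len: int = 2, w_audio: float = 0.5) -> List[Tuple[int, int]]:
    """Split the whole sequence into consecutive intervals [start, end)."""
    if len(audio) != len(visual):
        raise ValueError("audio and visual score sequences must have equal length")
    scenes = []
    start = 0
    while start < len(audio):
        end = grow_interval(audio, visual, start, init_len, w_audio)
        scenes.append((start, end))
        start = end
    return scenes
```

Because each accepted move strictly increases the score and the number of possible right edges is finite, the search always stops. Each interval then starts where the previous one ended, so the intervals cover the sequence without gaps.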

Similar resources

IEEE Transactions on Multimedia EDICS: 4-SEGM Enhanced Eigen-audioframes for Audio-visual Scene Change Detection

In this paper, a novel audio-visual scene change detection algorithm is presented and evaluated experimentally. An enhanced set of eigen-audioframes is created that is related to an audio signal subspace, where audio background changes are easily discovered. An analysis is presented that justifies why this subspace favors scene change detection. Additionally, a novel process is developed in ord...

RUCMM at MediaEval 2015 Affective Impact of Movies Task: Fusion of Audio and Visual Cues

This paper summarizes our efforts for the first time participation in the Violent Scene Detection subtask of the MediaEval 2015 Affective Impact of Movies Task. We build violent scene detectors using both audio and visual cues. In particular, the audio cue is represented by bag-of-audio-words with fisher vector encoding. The visual cue is exploited by extracting CNN features from video frames. ...

Scene Understanding through Audio-Visual Fusion

Scene understanding involves the integration of a wide variety of information to produce a thorough description of the robot's environment. By integrating spatial, visual and audio cues, we could provide a greater amount of understanding than can be obtained using one of the modalities alone. In this paper, we describe our current work on using audition to enhance existing object detection and t...

A multimedia content modeling and classification methodology using visual information for the protection of sensitive user groups

The thesis concerns the problems of visual tracking and violence detection in video sequences. For the visual tracking problem, two feature fusion frameworks are presented. For violence detection, a system that classifies movie segments as violent or non-violent is proposed. The first tracking framework, called the 'Model Fusion via Proposal' (MFP) framework, provides a way to efficiently fuse visua...

Multimodal and ontology-based fusion approaches of audio and visual processing for violence detection in movies

In this paper we present our research results towards the detection of violent scenes in movies, employing advanced fusion methodologies, based on learning, knowledge representation and reasoning. Towards this goal, a multi-step approach is followed: initially, automated audio and visual analysis is performed to extract audio and visual cues. Then, two different fusion approaches are deployed: ...

Publication date: 2003